    DDR: efficient computational method to predict drug-target interactions using graph mining and machine learning approaches.

    Motivation: Computationally finding drug-target interactions (DTIs) is a convenient strategy for identifying new DTIs at low cost with reasonable accuracy. However, current DTI prediction methods suffer from high false positive prediction rates. Results: We developed DDR, a novel method that improves DTI prediction accuracy. DDR is based on a heterogeneous graph that contains known DTIs together with multiple similarities between drugs and multiple similarities between target proteins. DDR applies a non-linear similarity fusion method to combine the different similarities. Before fusion, DDR performs a pre-processing step in which a subset of similarities is selected heuristically to obtain an optimized combination of similarities. DDR then applies a random forest model to different graph-based features extracted from the DTI heterogeneous graph. Using 5 repeats of 10-fold cross-validation, three testing setups, and the weighted average of the area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best state-of-the-art method for predicting DTIs: by 34% when the drugs are new, by 23% when the targets are new, and by 34% when the drugs and the targets are known but not all DTIs between them are known. Using independent sources of evidence, we verify as correct 22 of the top 25 novel DDR predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs. Availability and implementation: The data and code are provided at https://bitbucket.org/RSO24/ddr/. Supplementary information: Supplementary data are available at Bioinformatics online.
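
The abstract reports its results as reductions of the "AUPR score error". A minimal Python sketch of both quantities follows, assuming that the error is 1 - AUPR and that AUPR is estimated by average precision; DDR's exact estimator, its weighting across folds, and the function names below are assumptions, not taken from the DDR code.

```python
def average_precision(labels, scores):
    """Estimate AUPR as average precision: the mean of the precision values
    observed at the rank of each true positive, with items sorted by
    descending score. Ties keep their input order (a simplification)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


def aupr_error_reduction(aupr_new, aupr_baseline):
    """Relative reduction of the AUPR error (assumed to be 1 - AUPR)
    achieved by a new method over a baseline method."""
    err_new = 1.0 - aupr_new
    err_baseline = 1.0 - aupr_baseline
    return (err_baseline - err_new) / err_baseline


# Hypothetical labels (1 = known interaction) and predicted scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(labels, scores), 4))  # 0.8333
print(round(aupr_error_reduction(0.90, 0.85), 4))   # 0.3333
```

With these illustrative values (not DDR's reported scores), moving from an AUPR of 0.85 to 0.90 removes one third of the remaining error, which is the sense in which a 34% error reduction should be read.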

    Long-term serotonin abnormalities in the brain of immature rats subjected to febrile seizures

    Objective(s): Febrile seizures (FS) are the most common neurological disorder in early childhood. Animal models of hyperthermia-induced seizures provide a tool to investigate the mechanisms linking FS to epilepsy development and its co-morbidities. The present study investigates alterations in monoamine neurotransmitters in two brain areas, the cortex and the hippocampus, in animals subjected to prolonged FS at an immature age. Materials and Methods: Experimental animals were divided into three groups: a cage-control group (NHT-NFS), a positive hyperthermic control group (HT-NFS), and a hyperthermia-induced febrile seizure group (HT-FS). Each group was further subdivided into young (Y) and adult (A) subgroups. Results: There were significant changes in cortical and hippocampal serotonin levels that persisted into adulthood. In contrast, the changes in the two other neurotransmitters, norepinephrine and dopamine, were transient and had recovered by adulthood. Conclusion: The present study sheds more light on the importance of monoamine neurotransmitters in epileptogenesis following FS.

    epihet for intra-tumoral epigenetic heterogeneity analysis and visualization.

    Intra-tumoral epigenetic heterogeneity is an indicator of tumor population fitness and is linked to the deregulation of transcription. However, no published computational tool automates the measurement of intra-tumoral epigenetic allelic heterogeneity. We developed an R/Bioconductor package, epihet, to calculate intra-tumoral epigenetic heterogeneity and to perform differential epigenetic heterogeneity analysis. Furthermore, epihet implements a biological network analysis workflow for translating cancer-specific differential epigenetic heterogeneity loci into cancer-related biological functions and clinical biomarkers. Finally, we demonstrated the utility of epihet on acute myeloid leukemia, finding statistically significant differential epigenetic heterogeneity (DEH) loci relative to normal controls and constructing a co-epigenetic heterogeneity network and modules. epihet is available at https://bioconductor.org/packages/release/bioc/html/epihet.html
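
Allelic epigenetic heterogeneity is typically quantified from bisulfite-sequencing reads that each cover a few neighboring CpGs, treating each read's methylation pattern as an epiallele. The sketch below computes three measures commonly used for this purpose: the proportion of discordant reads (PDR), epipolymorphism, and Shannon entropy. The abstract does not state which of these epihet uses or how it computes them, so the definitions and function names here are assumptions for illustration, not the epihet API.

```python
import math
from collections import Counter


def epiallele_frequencies(reads):
    """Relative frequency of each methylation pattern, e.g. '1010' for a read
    covering four CpGs (1 = methylated, 0 = unmethylated)."""
    counts = Counter(reads)
    total = len(reads)
    return {pattern: n / total for pattern, n in counts.items()}


def proportion_discordant_reads(reads):
    """Fraction of reads carrying both methylated and unmethylated CpGs."""
    discordant = sum(1 for r in reads if "0" in r and "1" in r)
    return discordant / len(reads)


def epipolymorphism(reads):
    """1 - sum(p_i^2): probability that two reads drawn with replacement
    carry different epialleles."""
    return 1.0 - sum(p * p for p in epiallele_frequencies(reads).values())


def shannon_entropy(reads):
    """Shannon entropy (in bits) of the epiallele distribution."""
    return -sum(p * math.log2(p) for p in epiallele_frequencies(reads).values())


# Hypothetical reads at one 4-CpG locus.
reads = ["1111", "1111", "0000", "1010"]
print(proportion_discordant_reads(reads))  # 0.25
print(epipolymorphism(reads))              # 0.625
print(shannon_entropy(reads))              # 1.5
```

A locus at which every read shows the same pattern scores 0 on all three measures; larger values indicate a more heterogeneous cell population at that locus.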

    Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data.

    Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments from Hi-C chromatin interaction data. We find that SCI sub-compartment predictions outperform hidden Markov model (HMM) sub-compartment predictions in network topological centrality and clustering performance. Using orthogonal Chromatin Interaction Analysis by in-situ Paired-End Tag Sequencing (ChIA-PET) data, we further confirmed that SCI sub-compartment prediction outperforms HMM. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. In addition, we present a deep neural network that predicts sub-compartments from epigenome, replication timing, and sequence data. Our neural network produces more accurate sub-compartment predictions when SCI-determined sub-compartments are used as training labels.
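
SCI's two-stage design (embed the Hi-C interaction graph, then cluster the embedded genomic bins) can be illustrated with its second stage alone. This minimal Python sketch assumes each genomic bin already has an embedding vector (the 2-D vectors below are hypothetical) and uses plain k-means as the unsupervised step; the abstract does not name SCI's embedding or clustering algorithm, so k-means here is an illustrative stand-in.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster embedding vectors into k groups with Lloyd's k-means.
    Returns (labels, centroids); labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each bin joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D embeddings for six genomic bins forming two groups.
bins = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, _ = kmeans(bins, k=2)
print(labels)  # bins 0-2 share one label, bins 3-5 the other
```

Each resulting cluster would correspond to one predicted sub-compartment; the number of clusters k is a modeling choice.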

    An integrated expression atlas of miRNAs and their promoters in human and mouse

    MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of the Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis of Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, which allowed us to use primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.
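
The proxy argument rests on a simple idea: if primary and mature miRNA levels are correlated across samples, a model fitted where both are measured can estimate mature levels from primary (CAGE) measurements alone. The sketch below illustrates that idea with Pearson correlation on log-transformed values and an ordinary least-squares line; the log2(x + 1) transform, the choice of Pearson correlation, and all numbers are assumptions for illustration, not the FANTOM5 analysis.

```python
import math


def log_transform(values):
    """log2(x + 1), a common transform for count-like expression values."""
    return [math.log2(v + 1) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical paired measurements for one miRNA across five samples.
primary = log_transform([3, 7, 15, 31, 63])  # e.g. CAGE counts for the primary transcript
mature = log_transform([1, 3, 7, 15, 31])    # e.g. sRNA-seq counts for the mature miRNA
r = pearson(primary, mature)
slope, intercept = fit_line(primary, mature)
# Proxy: estimate the mature level in a sample with only a primary measurement.
estimated_mature = slope * math.log2(127 + 1) + intercept
print(round(estimated_mature, 6))  # 6.0
```

The stronger the correlation in samples where both measurements exist, the more trustworthy such a proxy becomes in libraries where only CAGE data are available.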

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of these TSSs could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping with long-read MinION sequencing. We train sequence-based deep learning models that predict the CAGE signal at STRs with high accuracy. These models reveal the importance of the sequences surrounding STRs, not only for distinguishing STR classes but also for predicting the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation levels, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add a further layer of complexity to STR polymorphism.
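
Microsatellites (STRs) are runs of a short motif repeated in tandem. A minimal Python sketch for locating perfect STRs in a sequence follows; the motif length of 1 to 6 nt and the three-copy minimum are common conventions chosen here as assumptions, and `find_strs` is a hypothetical helper, not part of the authors' pipeline. Dedicated STR callers also tolerate impure repeats, which this regex does not.

```python
import re


def find_strs(seq, max_period=6, min_copies=3):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    The lazy quantifier makes the regex try the shortest motif first, so a
    run such as 'ACACACACACAC' is reported as 'AC' x6 rather than 'ACAC' x3.
    """
    pattern = re.compile(
        r"(([ACGT]{1,%d}?)\2{%d,})" % (max_period, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        repeat, motif = match.group(1), match.group(2)
        hits.append((match.start(), motif, len(repeat) // len(motif)))
    return hits


print(find_strs("GGCACACACAGT"))  # [(2, 'CA', 4)]
print(find_strs("tttttt"))        # [(0, 'T', 6)]
```

Intervals found this way could then be intersected with CAGE peak coordinates to ask how many peaks initiate at STRs.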

    A Rare Presentation of Inflammatory Myofibroblastic Tumor in the Nasolabial Fold

    Inflammatory myofibroblastic tumor (IMT) is a benign lesion that occurs most frequently in the soft tissues and viscera. In the head and neck region, the tumor has been reported in the orbit, tongue, nasopharynx, larynx, paranasal sinuses, and central nervous system. Despite being benign, it exhibits infiltrative and destructive behaviour, so histopathological examination is necessary to confirm the diagnosis. We report the case of a 38-year-old woman who presented with a right nasolabial fold mass, which was confirmed histologically to be an IMT. The mass was surgically excised through a sublabial approach, and the postoperative period was uneventful. To the best of our knowledge, this is the first reported case of an IMT in the nasolabial fold.

    Plasma protein biomarkers for early prediction of lung cancer

    Summary: Background: Individual plasma proteins have been identified as minimally invasive biomarkers for lung cancer diagnosis with potential utility in early detection. Plasma proteomes provide insight into contributing biological factors; we investigated their potential for predicting future lung cancer. Methods: The Olink® Explore-3072 platform quantified 2941 proteins in 496 Liverpool Lung Project plasma samples, including 131 cases taken 1–10 years prior to diagnosis, 237 controls, and 90 subjects sampled at multiple time points. 1112 proteins significantly associated with haemolysis were excluded. Feature selection with bootstrapping identified differentially expressed proteins, which were subsequently modelled for lung cancer prediction and validated in UK Biobank data. Findings: For samples taken 1–3 years before diagnosis, 240 proteins differed significantly in cases; for samples taken 1–5 years before diagnosis, 117 of these and 150 further proteins were identified, mapping to significantly different pathways. Four machine learning algorithms gave median AUCs of 0.76–0.90 and 0.73–0.83 for the 1–3 year and 1–5 year proteins, respectively. External validation gave AUCs of 0.75 (1–3 year) and 0.69 (1–5 year), with an AUC of 0.7 up to 12 years prior to diagnosis. The models were independent of age, smoking duration, cancer histology, and the presence of COPD. Interpretation: The plasma proteome provides biomarkers that may be used to identify those at greatest risk of lung cancer. The proteins and pathways differ when lung cancer is more imminent, indicating that both biomarkers of inherent risk and biomarkers associated with the presence of early lung cancer may be identified. Funding: Janssen Pharmaceuticals Research Collaboration Award; Roy Castle Lung Cancer Foundation.
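
The abstract summarises prediction performance as AUCs and mentions feature selection with bootstrapping. The sketch below computes the ROC AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), plus a simple bootstrap selection frequency per protein. The ranking statistic (|AUC - 0.5| per protein), the top-k rule, the function names, and the toy data are hypothetical; the study's actual selection procedure is not specified in the abstract.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation (1 = case, 0 = control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_selection_frequency(samples, labels, top_k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each protein ranks among the
    top_k by |AUC - 0.5| (a hypothetical univariate selection rule).

    samples[i][f] is the level of protein f in sample i.
    """
    rng = random.Random(seed)
    n, n_proteins = len(samples), len(samples[0])
    counts = [0] * n_proteins
    completed = 0
    while completed < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # a resample lacking one class has no AUC; redraw
        strength = [abs(roc_auc(y, [samples[i][f] for i in idx]) - 0.5)
                    for f in range(n_proteins)]
        for f in sorted(range(n_proteins), key=lambda f: -strength[f])[:top_k]:
            counts[f] += 1
        completed += 1
    return [c / n_boot for c in counts]


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Toy data: protein 0 separates cases from controls, protein 1 does not.
samples = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.9, 4.0], [0.8, 3.0], [0.7, 5.0]]
labels = [0, 0, 0, 1, 1, 1]
print(bootstrap_selection_frequency(samples, labels, top_k=1))  # [1.0, 0.0]
```

Proteins selected in a high fraction of resamples are less likely to reflect sampling noise, which is the usual motivation for bootstrapping a selection step.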

    A novel method for improved accuracy of transcription factor binding site prediction

    Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs because of the large number of TFs involved. To overcome these problems, we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows, on average, 1.54-, 1.96- and 5.19-fold reductions of false positives at the same sensitivities compared with models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This suggests that PWM models for TFBS prediction can be efficiently replaced by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented in a web tool and in stand-alone software, freely available at http://cbrc.kaust.edu.sa/DRAF
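
For context on the baseline that DRAF is compared against, the sketch below builds a position weight matrix (PWM) of log-odds scores from aligned binding sites and scans a sequence with it. This is the classical PWM approach, not DRAF; the pseudocount, the uniform background, the function names, and the example sites are assumptions. A fixed score threshold applied to such scans tends to admit many non-functional matches, which is the false-positive problem the abstract describes.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Log2-odds PWM from equal-length aligned binding sites."""
    background = background or {b: 0.25 for b in BASES}
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos].upper() for site in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def scan(seq, pwm):
    """Score every window of len(pwm) in seq (assumed to contain only A, C,
    G, T); returns (start, score) pairs."""
    seq = seq.upper()
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]


def predicted_sites(seq, pwm, threshold):
    """Windows whose score reaches the threshold count as predicted TFBSs."""
    return [start for start, score in scan(seq, pwm) if score >= threshold]


# Hypothetical aligned sites for a short motif.
pwm = build_pwm(["TGAC", "TGAC", "TGAG", "TCAC"])
best_start, best_score = max(scan("CCTGACCA", pwm), key=lambda hit: hit[1])
print(best_start)  # 2
```

Lowering the threshold raises sensitivity but also admits more false positives; the abstract's comparison is framed as fold reductions in false positives at matched sensitivity.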

    Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle

    Despite half a century of research, the biology of dinoflagellates remains enigmatic: they defy many functional and genetic traits attributed to typical eukaryotic cells. Genomic approaches to studying dinoflagellates are often stymied by their large, multi-gigabase genomes. Members of the genus Symbiodinium are photosynthetic endosymbionts of stony corals that provide the foundation of coral reef ecosystems. Their smaller genome sizes provide an opportunity to interrogate the evolution and functionality of dinoflagellate genomes and endosymbiosis. We sequenced the genome of the ancestral Symbiodinium microadriaticum and compared it to the genomes of the more derived Symbiodinium minutum and Symbiodinium kawagutii, to eukaryote model systems, and to transcriptomes from other dinoflagellates. Comparative analyses of genome and transcriptome protein sets show that all dinoflagellates, not only Symbiodinium, possess significantly more transmembrane transporters involved in the exchange of amino acids, lipids, and glycerol than other eukaryotes. Importantly, we find that only Symbiodinium harbor an extensive transporter repertoire associated with the provisioning of carbon and nitrogen. Analyses of these transporters show species-specific expansions, which provide a genomic basis to explain differential compatibilities with an array of hosts and environments, and highlight the putative importance of gene duplications as an evolutionary mechanism in dinoflagellates and Symbiodinium.